Automatically Identifying Calling-Prone Higher-Order Functions of Scala Programs to Assist Testers
Published in Journal of Computer Science and Technology (2020), 2020
Abstract
For the rapid development of internetware, functional programming languages, such as Haskell and Scala, can be used to implement complex domain-specific applications. In functional programming languages, a higher-order function is a function that takes functions as parameters or returns a function. Using higher-order functions in programs can increase the generality and reduce the redundancy of source code. To test a higher-order function, a tester needs to check the requirements and write another function as the test input. However, due to the complex structure of higher-order functions, testing higher-order functions is a time-consuming and labor-intensive task. Testers have to spend an amount of manual effort in testing all higher-order functions. Such testing is infeasible if the time budget is limited, such as a period before a project release. In practice, not every higher-order function is actually called. We refer to higher-order functions that are about to be called as calling-prone ones. Calling-prone higher-order functions should be tested first. In this paper, we propose an automatic approach, namely Phof, which predicts whether a higher-order function of Scala programs will be called in the future, i.e., identifying calling-prone higher-order functions. Our approach can assist testers to reduce the number of higher-order functions of Scala programs under test. In Phof, we extracted 24 features from source code and logs to train a predictive model based on known higher-order function calls. We empirically evaluated our approach on 4 832 higher-order functions from 27 real-world Scala projects. Experimental results show that Phof based on the random forest algorithm and the Synthetic Minority Oversampling Technique Processing strategy (SMOTE) performs well in the prediction of calls of higher-order functions. Our work can be used to support the scheduling of limited test resources.